Data science promises new insights, helping transform information into knowledge that
can drive science and industry.

Realizing the Potential of Data Science


The ability to manipulate and understand data is increasingly critical to discovery and
innovation. As a result, we see the emergence of a new field, data science, that focuses
on the processes and systems that enable us to extract knowledge or insight from data in
various forms and translate it into action. In practice, data science has evolved as an
interdisciplinary field that integrates approaches from such data-analysis fields as
statistics, data mining, and predictive analytics and incorporates advances in scalable
computing and data management. But as a discipline, data science is only in its infancy.

The challenge of developing data science in a way that achieves its full potential raises
important questions for the research and education community: How can we evolve the
field of data science so it supports the increasing role of data in all spheres? How do we
train a workforce of professionals who can use data to its best advantage? What should we
teach them? What can government agencies do to help maximize the potential of data
science to drive discovery and address current and future needs for a workforce with data
science expertise? Convened by the Computer and Information Science and Engineering
(CISE) Directorate of the U.S. National Science Foundation as a Working Group on the
Emergence of Data Science (https://www.nsf.gov/dir/index.jsp?org=CISE), we present a
perspective on these questions with a particular focus on the challenges and
opportunities for R&D agencies to support and nurture the growth and impact of data
science. For the full report on which this article is based, see Berman et al. 2

key insights

■ Data science can help connect previously disparate disciplines, communities, and users
to provide richer and deeper insights into current and future challenges.

■ Data science encompasses a broad set of areas, including data-focused algorithmic
innovation and machine learning; data mining and the use of data for discovery;
collection, organization, stewardship and preservation of data; privacy challenges and
policy associated with data; and pedagogy to support the education and training of
data-savvy professionals.

■ There is a growing gap between commercial and academic research practice for data
systems that needs to be addressed.

APRIL 2018 | VOL. 61 | NO. 4 | COMMUNICATIONS OF THE ACM 67

contributed articles

The importance of data science and the opportunities inherent in it are clear (see
http://cra.org/data-science/). If the National Science Foundation, working with other
agencies, foundations, and industry, can help foster the evolution and development of
data science and data scientists over the next decade, our research community will be
better able to meet the potential of data science to drive new discovery and innovation
and help transform the information age into the knowledge age. We hope this article
serves as a basis for dialogue within the academic community, the industrial research
community, and ACM and relevant ACM special interest groups (such as SIGKDD and SIGHPC).

The Data Life Cycle 

Data never exists in a vacuum. Like a biological organism, data has a life cycle, from
birth through an active life to “immortality” or some form of expiration. Also like a
living and intelligent organism, it survives in an environment that provides physical
support, social context, and existential meaning. The data life cycle is critical to
understanding the opportunities and challenges of making the most of digital data; see
the figure here for the essential components of the data life cycle.

As an example of the data life cycle, consider data representing experimental outputs of
the Large Hadron Collider (LHC), an instrument of tremendous importance to the physics
community and supported by researchers and nations worldwide. LHC experiments collide
particles to test the predictions of various theories of particle physics and high-energy
physics. In 2012, data from LHC experiments provided strong evidence for the Higgs boson,
supporting the veracity of the Standard Model of particle physics. This scientific
discovery was Science Magazine’s 2012 “Breakthrough of the Year” 3 and was recognized
with the Nobel Prize in Physics in 2013.

The life cycle of LHC data is fascinating. At “birth,” data represents the results of
collisions carried out within an instrument housed in a 17-mile tunnel on the
France-Switzerland border. Most of the data generated is technically “uninteresting” and
disposed of, but a tremendous amount of “interesting” data remains to be analyzed and
preserved. Estimates are that by 2040, there will be from 10 exabytes to 100 exabytes (an
exabyte is a billion billion bytes) of “interesting” data produced by the LHC. Retained
LHC data is annotated, prepared for preservation, and archived at more than a dozen
physical sites. It is published and disseminated to the community for analysis and use at
more than 100 other research sites. Critical attention to stewardship, use, and
dissemination of LHC data throughout its life cycle has played a key role in enabling the
scientific breakthroughs that have come from the experiments.

In addition to development of data stewardship, dissemination, and use protocols, the LHC
data ecosystem also provides an economic model that sustainably supports the data and its
infrastructure. It is the combination of this greater ecosystem, community agreements
about how the data is organized, and political and economic support that allows LHC data
to meet its potential to transform our knowledge of physics and enable scientists to make
the most of the tremendous investment being made in the LHC’s physical instruments and
facilities.

The data life cycle diagram outlined in the figure and the LHC example suggest a seamless
set of actions and transformations on data, but in many scientific communities and
disciplines today these steps are isolated. Domain scientists focus on generating and
using data. Computer scientists often focus on platform and performance issues, including
mining, organizing, modeling, and visualizing, as well as the mechanisms for eliciting
meaning from the data through machine learning and other approaches. Engineers often
focus on the physical processes of acquisition and instrument control, treating data as
“dirty signals” or as control inputs for other equipment. Statisticians may focus on the
mathematics of models for risk and inference. Information scientists and library
scientists may focus on stewardship and preservation of data and the “back-end” of the
pipeline, following acquisition, decisions, and action in the realm of publishing,
archiving, and curation.

There is a significant opportunity for bridging gaps in development of effective life
cycles for valuable data within and among the computer science, information science,
domain, and physical science and engineering communities, for a start. There is also an
opportunity for bridging gaps among machine learning, data analytics, and related
disciplines (such as statistics and operations research). Here we focus on some
opportunities.

Figure. The data life cycle and surrounding data ecosystem from the Realizing the
Potential of Data Science Report. 2

Teaching Data Science: Many Flowers Blooming

To support research and workforce development for data science, we must determine how
(and, interestingly, where in the institution) it should be taught. Much as the
emergence of computer science in the 1960s created the first organizational units and
degrees dedicated to computing in modern universities, the rise of data science is
driving a range of interesting curricular experiments. For a sense of the rapidly evolving
landscape, consider these five:

University of California. The University of California, Berkeley, Data Science Education
Program 8 is part of its recently established Division of Data Sciences, at the same
level as Berkeley’s colleges and schools, integrating with them. The introductory
class provides a foundation for students in all fields to engage with data and creates
pathways to advanced-level work. The foundational course combines instruction
in core computational and statistics concepts while enabling students to work with
real data in a range of fields. It is designed to be accessible to undergraduates of any
intended major without prior experience. A set of connector courses (mostly taken
simultaneously with the foundational course) enables them to apply core skills from the
foundational course to real-world issues that relate to their areas of interest. There are
also advanced courses, including an upper-division integrative course called “Data 100:
Principles and Techniques of Data Science.”

University of Michigan. The University of Michigan Undergraduate Program in Data
Science 11 is a new major offered as a joint program by the Electrical Engineering and
Computer Science and Statistics Departments. The data science major is a rigorous
program focusing on aspects of computer science, statistics, and mathematics relevant
for analyzing and manipulating large datasets. It can be entered from either the College
of Engineering or the College of Literature, Science, and Arts.

Columbia University. The Columbia University Data Science Institute’s Master of
Science in Data Science 4 offers a professional master’s degree to students with any
undergraduate degree that includes suitable quantitative prior coursework. It starts
from a set of four foundational courses (that can be taken independently to yield a data
science certificate), focusing on algorithms, probability/statistics, machine learning,
and visualization.

University of Illinois. The University of Illinois at Urbana-Champaign Master of
Computer Science in Data Science degree 10 is offered as an online professional
master’s in computer science available on the Coursera massive open online course
platform. 5 The degree seeks to create a global gateway into the discipline. The program
builds expertise in four core areas of computer science (data visualization, machine
learning, data mining, and cloud computing) and also offers courses in collaboration
with the university’s Statistics Department and School of Information Sciences.
This collaboration specifically strives to cover the full data life cycle, including its
mathematical, computational, and curation and stewardship components, in an
integrated and comprehensive fashion.

University of Chicago. The University of Chicago Master of Science in Computational
Analysis and Public Policy program 9 is offered jointly by the Department of Computer
Science and the Harris School of Public Policy. As government decision making is
increasingly data-driven, data use, data sharing, transparency, and accountability become
increasingly important issues from both a public policy and a technological perspective.
The program focuses on the intersection of policy and computer science. Students take
courses across both areas, preparing them to make meaningful contributions to the
design, implementation, and rigorous analysis of policies in the public sector.

National Data Science Research 

Almost every stage of the data life cycle, as outlined in the figure, provides deep
research opportunities. Moreover, an overarching area of opportunity for a national data
science agenda is to bridge the gaps in the life cycle, building stronger connections
among the computer science, information science, statistics, domain, and physical science
and engineering communities, as outlined earlier. That is, a business-as-usual research
agenda is likely to strengthen individual technologies behind discrete steps in the data
life cycle but unlikely to nurture broader breakthroughs or paradigm shifts that cut
across existing disciplinary silos. It is an essential and defining attribute of data
(“big” and otherwise) that it can connect previously disparate disciplines, communities,
and users to provide richer and deeper insight into current and future challenges.

It is vital to encourage a broader and more holistic view of data as integrating research
opportunities across the sciences, engineering, and a range of application domains. One
such opportunity is to invest in the full data life cycle and surrounding environment as
a central outcome itself, not as a side effect or intermediate step to another desirable
outcome. In parallel with development of data science in depth as a core component of
computer science, data science should also evolve in breadth to address the needs of
domains outside computer science. Our community has a unique opportunity to advance data
science by applying data-driven strategies to individual domain research and cross-domain
research opportunities.

A second opportunity involves what might be called “embodied intelligence” scenarios that
big data is enabling for the first time. Recent breakthroughs in a range of foundational
artificial intelligence and “deep learning” technologies 1 have made it possible to create
sophisticated software artifacts that “act intelligently.” The key innovations are in
mathematical-pattern-recognition techniques that take input from millions of training
examples of correct responses to create software systems (soon likely hardware systems as
well) able to better recognize images, decode human speech, discover critical patterns in
legal and business documents, and more. As engineered artifacts, these artificial
intelligence systems are embodied as complex mathematical formulae that are customized to
purpose, or “trained,” by a truly astounding volume of numerical parameters (such as 10
million for a decent image-classification system today).

These trained decision-oriented models are becoming core components in a range of novel
software solutions to complex problems, creating cross-disciplinary challenges. 6 For
example, what does it mean for such a component to be “correct” when it is perhaps only
70% accurate? What should the life cycle be for the data used to train and update these
models? What are the policy implications (and designation of responsibility) for embodied
intelligent agents trained on such data that behave with negative consequences (such as
when they are blamed for an autonomous vehicle that crashes, or by a customer whose
account is suspended inappropriately based on an automatic inference)? Software
engineering, as a discipline, is challenged by such imprecision and by versioning and
testing of the enormous data components (gigabyte-to-terabyte scale training data) for
these systems. Existing notions of model verification/validation seem woefully
insufficient. And the policy, stewardship, and curation questions go largely unasked and
unanswered.

Note that the existence of predictive models is not unique to machine learning; for
example, statistical models have been used in epidemiology, and physical models are
common in weather prediction and nuclear simulations. The “training” aspect for data
science may be novel in the context of the software engineering of solutions, in that the
resulting models may lack the guarantees associated with statistical power and
sample-size calculations.
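
To make the contrast with classical statistical guarantees concrete, the following is a minimal sketch of a textbook sample-size calculation applied to the "70% accurate" component mentioned earlier. It is an illustration only, not a method from the report: it assumes a normal approximation to the binomial and independently drawn test examples, and the function names `sample_size_for_accuracy` and `accuracy_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_accuracy(expected_accuracy: float,
                             margin: float,
                             confidence: float = 0.95) -> int:
    """Number of test examples needed to estimate accuracy within +/- margin.

    Assumption: normal approximation to the binomial,
    n = z^2 * p * (1 - p) / margin^2.
    """
    if not 0.0 < expected_accuracy < 1.0:
        raise ValueError("expected_accuracy must be strictly between 0 and 1")
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be strictly between 0 and 1")
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n = z * z * expected_accuracy * (1.0 - expected_accuracy) / (margin * margin)
    return math.ceil(n)


def accuracy_interval(correct: int, total: int, confidence: float = 0.95):
    """Approximate confidence interval for accuracy observed on a test set."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # How many labeled examples pin down a ~70% accurate model to within 5 points?
    print(sample_size_for_accuracy(0.70, 0.05))
    # How uncertain is "70% accurate" when measured on only 100 examples?
    print(accuracy_interval(70, 100))
```

Under these assumptions, a reported accuracy carries a quantifiable uncertainty that depends on test-set size. That is the kind of guarantee the paragraph above notes may be missing when trained models are treated purely as software components.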

Yet another opportunity is to address the growing gap between commercial and academic
research practice for data systems at the edge of the state of the art. Much has been made
of the increasing “reverse migration” of strong academic researchers into data-rich
enterprises (such as Facebook, Google, and Microsoft). While this is likely good for the
U.S. national economy in the near term, it is worrisome for the future of discovery-based
open research, education, and training in the academic sector. In addition to the
challenges of attracting sponsored research funding, another reason for the “brain drain”
from the research community into the private sector may be declining
infrastructure-support environments, including the sparsity of large datasets and
adequate infrastructure in academia that support data science research at scale. When the
best infrastructure environment for cutting-edge research is consistently in the private
sector, the opportunity for innovation in the public sector deteriorates. Government
support for strategic and committed public-private partnerships that build adequate and
representative at-scale infrastructure in the academic community for researchers can
unlock innovation in academic research and ultimately support the private sector through
development of a more sophisticated, educated, better-trained workforce.


National Data Science Education and Training

Higher-education institutions across the U.S. recognize that data science is a critical
skill for 21st-century research and a 21st-century workforce. In higher education, data
science curricula have two audiences: new professionals in data science, and scientists
and professionals who need data science skills to contribute to other fields. Data science
curricula in higher education often focus on both, the same way curricula in computer
science departments educate computer science students and provide training in computer
skills to students from other disciplines to promote computer literacy.

It is important to note that, at present, there is no single model of which department,
school, or cross-unit collaboration within higher-education institutions should have the
responsibility for data science education and training. Data science programs are being
sited in departments and schools of computer science, information science, statistics,
and management. Many of the most successful, particularly at the undergraduate level,
represent university-wide coalitions frequently sponsored by interdisciplinary institutes,
rather than by a particular department or school. There is thus no common agreement as to
where data science should “live” in the institution, though there is much interesting
experimentation at this point (see the sidebar “Teaching Data Science” for several
programmatic configurations). Note that when a university chooses to house “data science”
in an existing department or college, it implicitly adopts the standards and culture of
that existing organization. In contrast, when a university introduces “data science” as an
interdisciplinary function, it confronts the heterogeneity of the new field up front but
will likely deal with additional administrative overhead associated with a
cross-organizational entity. We focus on trends in both data science education and
training in the following paragraphs.

Educational curricula in data science have yet to “standardize” and appear today with many
interesting course configurations. In general, data scientists are expected to be able to
analyze large datasets using statistical techniques, so statistics and modeling are
typically part of required coursework. Moreover, a comprehensive data science curriculum
is more than machine learning and statistics, possibly including courses on programming,
data stewardship, and ethics, in addition to other areas. Data scientists must be able to
find meaning in unstructured data, so classes on programming, data mining, and machine
learning are often part of the core. Data scientists must also be able to communicate
their findings effectively, so courses on visualization may be offered, at least as an
elective. In recognition of the challenges that arise from misuse of data and incorrect
conclusions drawn from data, ethics is also becoming a part of responsible curricula for
the field.

Other courses that appear either in the core or as an elective in various programs include
research design, databases, algorithms, parallel computing, and cloud computing, all of
which reflect skills an employer might expect from a data scientist. Many programs also
require a capstone project that gives students experience in working through real-world
problems in teams in a particular domain. Data science courses are also becoming a staple
of quality online programs.

A strong data science curriculum requires faculty with appropriate expertise and
engagement with the field. The pull of faculty with expertise in data science and related
fields away from academia and toward industry creates a challenge for educational
institutions in mounting such programs. It also presents a potential challenge to
development of data science as a formal discipline.

To combat this trend, the Moore and Sloan Foundations in 2013 created a joint $38 million
project, the Moore-Sloan Data Science Environments, to fund initiatives to create “data
science environments,” 7 addressing challenges in academic careers, education and
training, tools and software, reproducibility and open science, physical and intellectual
space, and data science studies. This funding has been transformational, providing
critical “worked examples” of data science programs useful for current and future efforts.

As the current diversity of curricula and programs shows, data science is going through an
important and healthy period of experimentation. It is important that we do not
“standardize” data science too quickly and that we continue to explore configurations of
courses, areas, projects, faculty, and partnerships to gain critical experience in how to
best educate new generations of data scientists.

In addition to “data science” programs and majors that serve to evolve data science as a
discipline, data science skills are increasingly critical as training for other
disciplines and professions as they become more and more data-enabled. Effective training
will empower data-enabled professionals and domain scientists to utilize data effectively
and operate within a broader data-driven environment, develop an appreciation of what data
can tell us and what it cannot, acquire appropriate technical knowledge about how data
should be handled, gain awareness that correlation in data does not necessarily imply
causality, and begin to develop a sense of responsible methodologies and ethical
principles in the use of data.
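
The point that correlation does not necessarily imply causality can be illustrated with a small simulation of the kind a training module might use. This is a sketch under stated assumptions, not material from the report: the data is synthetic, the hidden common cause and noise levels are invented for illustration, and the helper names `pearson`, `residuals`, and `simulate` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """Residuals of ys after a least-squares linear fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlation between two variables that share
    a common cause but have no causal effect on each other."""
    rng = random.Random(seed)
    common_cause = [rng.gauss(0, 1) for _ in range(n)]
    # Neither a nor b influences the other; both depend on common_cause.
    a = [c + rng.gauss(0, 0.5) for c in common_cause]
    b = [c + rng.gauss(0, 0.5) for c in common_cause]
    raw = pearson(a, b)
    # Removing the common cause's linear contribution from both variables
    # gives the partial correlation of a and b given common_cause.
    adjusted = pearson(residuals(a, common_cause), residuals(b, common_cause))
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"raw correlation:      {raw:.2f}")
    print(f"adjusted correlation: {adjusted:.2f}")
```

In this synthetic setup the two variables are strongly correlated even though neither affects the other, and the association largely disappears once the shared cause is accounted for.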

More specific training in the nuts and bolts of dealing with data is also critical for
various data-driven professions. Training in programming and software engineering is
useful for students who will be using data-driven simulations and models in their
research. Training in version control and the subtleties of stewardship, including working
with repositories for data and software, should be provided to computational researchers.
And training in best practices for digital scholarship and reproducibility should be
integrated into research-methodology curricula. The ethics of using (and misusing) data
should be incorporated into all training programs to promote effective and responsible
data use. Courses teaching these skills can be made available in a variety of venues, from
university courses and modules to online courses to professional courses that could be
developed by scientific societies and communities.
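
As one example of the nuts-and-bolts practice described above, the following minimal sketch records a checksum for every file in a dataset directory so a later analysis can verify it is running on the same bytes. It is an illustration under assumptions, not a practice prescribed by the report: the function names and the idea of a per-directory manifest are hypothetical choices, and only Python standard-library calls are used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of_file(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """Names of files added, removed, or modified since the manifest was built."""
    current = build_manifest(data_dir)
    names = set(current) | set(manifest)
    return sorted(name for name in names if current.get(name) != manifest.get(name))
```

A manifest like this can be serialized (for example, with `json.dumps`) and committed to version control alongside analysis code, so the code and the exact data it was run on are recorded together.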

Data Science Research and Education Infrastructure

Any innovative agenda in data science research and education will depend on a foundation
of enabling data infrastructure and useful datasets. Research in data science needs access
to sufficiently large and numerous datasets to illuminate and validate results. The
datasets must be available for reproducible research and hosted by reliable
infrastructure.

Lack of such infrastructure and datasets will inhibit success. Education and training in
data science is most authentic in a setting where students can work on data that
represents the datasets and environments they will see in the professional arena; that is,
data that is both at-scale and embedded in a stewardship infrastructure that enables it to
be a useful tool in analysis, modeling, and mining.

In the best case, data infrastructure should support access to data for research and
education that is equivalent to access to any other key utility; it must be “always on,”
it must be robust enough to support extensive use, and the quality must be good. In the
world of data, this comes down to responsible stewardship, meaning there must be actors,
plans, and both “social” and technical infrastructure to ensure the following:

Data is appropriately tracked, monitored, and identified. Who created, curated, and used
the data? Can it be persistently identified? Are there adequate privacy and security
controls?

Data is well cared for. Who is committed to keeping it, in what formats, and for how long?
Who is committed to funding data stewardship? And how will it be stored and migrated to
next-generation media?

Data is discoverable and useful. How is data made available and to whom? What services are
needed to make good use of it? And what metadata and other information is needed to
promote reproducibility?

Data stewardship is compliant with policy and good practice. Does stewardship comply with
community standards and appropriate policy regarding reporting, intellectual property, and
other concerns? Are the rights, licenses, and other properties that will determine
appropriate use clear? And what data and metadata are to be kept, who owns it and its
by-products, and who has access to it and its metadata or parts of it?
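
The stewardship questions above can be read as fields of a record kept alongside a dataset. The following is a minimal sketch of such a record, assuming one field per question. The class `StewardshipRecord`, its field names, and the example values are hypothetical and do not correspond to any existing metadata standard or to a schema proposed in the report.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class StewardshipRecord:
    """Hypothetical per-dataset record answering the stewardship questions."""
    # Tracked, monitored, and identified
    persistent_id: Optional[str] = None          # e.g., a DOI or other identifier
    creators: List[str] = field(default_factory=list)
    access_controls: Optional[str] = None        # privacy and security controls
    # Well cared for
    steward: Optional[str] = None                # who is committed to keeping it
    funder: Optional[str] = None                 # who funds stewardship
    file_formats: List[str] = field(default_factory=list)
    retain_until: Optional[date] = None
    # Discoverable and useful
    access_point: Optional[str] = None           # how and to whom it is available
    documentation: Optional[str] = None          # metadata needed for reproducibility
    # Compliant with policy and good practice
    license: Optional[str] = None
    owner: Optional[str] = None

    def unanswered(self) -> List[str]:
        """Names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    record = StewardshipRecord(
        persistent_id="doi:10.0000/example",     # placeholder, not a real DOI
        creators=["A. Researcher"],
        file_formats=["csv"],
    )
    # Lists the stewardship questions this dataset has not yet answered.
    print(record.unanswered())
```

Keeping the open questions visible in this way is one design choice among many. Its value is that gaps in stewardship (no named steward, no license, no retention date) are caught before data is shared.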

Since data will become the core for research and insight for a broad set of academic
disciplines, access to it in a usable form on a reasonable time scale becomes the entry
point for any effective research and education agenda.



Government R&D agencies (such as the National Science Foundation) have an opportunity to
ensure the lack of adequate data infrastructure does not present a roadblock to
innovative research and educational programs.

Developing and sustaining the infrastructure that ensures that research data is available
to the public and accessible for reuse and reproducibility requires stable economic
models. While there is much support for the development of tools, technologies, building
blocks, and data-commons approaches, few U.S. federal programs directly address the
resource challenges for data stewardship or provide help for libraries, domain
repositories, and other stewardship environments to become self-sustaining and address the
need for public access.

While the U.S. federal government cannot take on the entire responsibility for stewardship
of sponsored research data and its infrastructure, neither should it shy away from
providing seed or transition funding for institutions and organizations to develop
sustainable stewardship options for the national community. We encourage the community,
inside and outside of government, to support the development and piloting of sustainable
data stewardship models for data-driven research and data science education through
strategic programs, guidance, and cross-agency and public-private partnerships.
Science-centric government agencies like the National Science Foundation should coordinate
with peer agencies like the National Institutes of Health that focus on similar issues to
leverage investments and provide economies of scope and scale.

Realizing the Potential 

The research, education, and infrastructure discussions here focus on developing a
foundation that can increase the pool of data scientists and data-literate professionals
to meet the current and near-term challenges of data-driven efforts in all sectors, as
well as the need to evolve data science as a discipline that can meet the challenges of
future data-driven scenarios.

Data is everywhere, providing an increasingly important tool for a broad spectrum of
endeavors. As systems grow “smarter” and take on more autonomous and decision-making
capabilities, we will increasingly face data science technical challenges and the social
challenges of governance, ethics, policy, and privacy. Addressing them will be critical to
rendering data-driven systems useful, effective, and productive, rather than intrusive,
limiting, and destructive. Such solutions will be particularly important in highly
data-driven environments like the Internet of Things. Moreover, as fundamental
computational platforms change in response to the looming end of Moore’s Law scaling of
semiconductors, 12 there will be tremendous opportunities to reimagine the entire
hardware/software enterprise in the light of future data needs.

Conclusion 

Our community must be prepared to deal with future scenarios by encouraging the initial
research that lays the groundwork for innovative uses of data, well-functioning
data-focused systems, useful policy and protections, and effective governance of
data-driven environments. With both programmatic resources and a platform for community
leadership, federal R&D agencies like the National Science Foundation play an important
role in guiding the community toward innovation. Both deep efforts to expand the field and
its impact and broad efforts to help data science reach its potential for transforming
21st-century research, education, commerce, and life are needed.